Recognition in Video: Recent Advances in Perceptual Vision Technology Dept. of Computing Science, U. Windsor 31 March 2006 Dr. Dmitry O. Gorodnichy Computational Video Group Institute for Information Technology National Research Council Canada synapse.vit.iit.nrc.ca

2. Recognition in Video Outline
Part I. Recognition in Video tutorial (85%)
1. Uniqueness, demand & complexity. Specific consideration: faces in video
2. Video recognition in biological vision systems:
   a. Lessons on video data acquisition & processing
   b. Modeling associative data processing
3. Let's build it:
   a. General rules
   b. Face-oriented applications
   c. From pixels to a person's name
Part II. Examples of use (15%): NRC Perceptual Vision Technology
1. Hands-free user interfaces (PUI Nouse)
2. Face recognition at a desktop and on TV
3. Automated surveillance

3. Recognition in Video 1. Video revolution and how to approach it

4. Recognition in Video A little history. Beginning of the XX century: the first motion pictures (movies) appeared, when it became possible to display video frames fast: 12 frames per second. Beginning of the XXI century: computerized real-time video processing became possible, as it became possible to process video fast: >12 frames per second. So what?

5. Recognition in Video Peculiarities of video recognition. Finally, many automated video applications are possible (to match or assist human video processing)? However, automated video recognition is not a mere extension of image or pattern recognition, and it is much harder than most non-blind people think… Let's find out why!

6. Recognition in Video Recognize: who, where & what? George, John, Ringo, Paul. Watch for problems with: orientation, expression, out-of-focus blur, occlusion, multiple faces, illumination, video compression, distortions, interlacing, RESOLUTION.

7. Recognition in Video Photos vs Video: Summary. Photos: high spatial resolution, no temporal knowledge. E.g. faces: in a controlled environment (similar to fingerprint registration), "nicely" force-positioned, 60 pixels between the eyes. Video: low spatial resolution, high temporal resolution (individual frames of poor quality); taken in an unconstrained environment (in a "hidden" camera setup): subjects don't look into the camera, often don't even face the camera. Yet (for humans) video, even of low quality, is often much more informative and useful than a photograph! Video requires different approaches!

8. Recognition in Video Pattern recognition vs video recognition. Pattern recognition approaches were developed in the XX century; video recognition approaches belong to the XXI century. Video recognition is a very young area (while pattern recognition is old).

9. Recognition in Video The unique place of this research: at the intersection of computer vision, pattern recognition and neurobiology ("You are here").

10. Recognition in Video 1.b Faces in Video

11. Recognition in Video Current situation: computers fail. Face recognition systems' performance (from biometrics forums; NATO Biometrics workshop, Ottawa, Oct. 2004). But why?

12. Recognition in Video The reason? Different image-based biometric modalities.

13. Recognition in Video Common face-in-video setups. In TV & VCD (e.g. in movies): 320x240 frames, actors' faces occupying 1/16 – 1/32 of the screen; head = … pixels, i.o.d. = … pixels. And that's OK for humans!

14. Recognition in Video 2. Video recognition: how the human brain does it. Part 1: Data acquisition

15. Recognition in Video How do we see and recognize? The human brain is a complicated thing… but a lot is known and can be used.

16. Recognition in Video Lesson 1: order. Try to recognize this face. The order is: Detection, then Tracking, then Recognition. Colour and motion are used for localization; intensity is used for recognition (a minimal sketch follows).
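A minimal sketch of this Detection -> Tracking -> Recognition ordering, assuming OpenCV is available; frame differencing stands in for the cheap colour/motion localization cue, and `recognize_face` is a hypothetical recognizer, not part of the presented system.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
prev_gray = None

def process_frame(frame_bgr, recognize_face):
    """Localize with cheap cues first, recognize on intensity last."""
    global prev_gray
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)              # intensity channel
    motion = cv2.absdiff(gray, prev_gray) if prev_gray is not None else gray
    prev_gray = gray
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.2, 5):  # 1. detect
        if motion[y:y+h, x:x+w].mean() < 5:                          # 2. keep moving regions only
            continue
        results.append(recognize_face(gray[y:y+h, x:x+w]))           # 3. recognize on intensity
    return results
```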

17. Recognition in Video Lesson 2: modalities. Pendulums for my daughters (for 1-3 month olds): detection by saliency in a) motion, b) colour, c) disparity, d) intensity. These channels are processed independently in the brain (think of a frog catching a fly, or a bull running at a torero). Intensity is not raw values, but frequencies, orientations, gradients.
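As an illustration of the last point, a small sketch of deriving gradient/orientation channels from raw intensity; plain Sobel filters are an assumption here, standing in for the frequency and orientation channels of biological vision.

```python
import cv2
import numpy as np

def intensity_channels(gray):
    """Gradient magnitude and local orientation instead of raw intensity values."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    magnitude = cv2.magnitude(gx, gy)       # gradient strength
    orientation = np.arctan2(gy, gx)        # local edge orientation
    return magnitude, orientation
```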

18. Recognition in Video Lessons 3 & 4. For us, all video data are of low quality (except at the fixation point): saccadic eye motion, foveal vision. Accumulation over time: we don't need high resolution, because data are accumulated over time.

19. Recognition in Video Lessons 5 & 6: local processing; non-linear colour perception.

20. Recognition in Video Lesson 7 (last): we recognize associatively. Associative memory allows us to accumulate (tune) data (knowledge) over time.

21. Recognition in Video 2. Video recognition: how the human brain does it. Part 2: Data processing

22. Recognition in Video We think associatively! Data is memorized and recognized in the brain associatively. What does this give us? How do we model the associative process?

23. Recognition in Video Associative thinking. Data is memorized and recognized in the brain associatively. What it gives: 1. the ability to accumulate data over time; 2. robust recognition (from parts or from low-quality input). Our task: to model the associative process.

24. Recognition in Video Associative memory at a high level. Pavlov's dogs: a stimulus neuron Xi ∈ {+1, –1}, a response neuron Yj ∈ {+1, –1}, and the synaptic strength connecting them, -1 < Cij < +1.

25. Recognition in Video Associative memory at a low level. The brain stores information in the synapses connecting vast numbers of interconnected neurons. Neurons are either at rest or activated, Yi ∈ {+1, -1}, depending on the values of the other neurons Yj and the strengths of the synaptic connections. The brain is thus a network of binary neurons evolving in time from an initial state: from a stimulus (coming from the retina) to an attractor, which provides the association to that stimulus (e.g. the name of a person). Refs: Hebb'49, Little'74,'78, Willshaw'71, …
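A minimal sketch of such an attractor network's recall dynamics in plain NumPy (asynchronous sign updates); this is only an illustration of the principle, not the author's PINN library.

```python
import numpy as np

def recall(C, stimulus, max_sweeps=100):
    """Evolve binary neurons Y in {+1,-1} under synapses C until an attractor is reached."""
    y = np.where(np.asarray(stimulus, dtype=float) >= 0, 1.0, -1.0)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):                 # asynchronous, neuron-by-neuron update
            s = 1.0 if C[i] @ y >= 0 else -1.0
            if s != y[i]:
                y[i], changed = s, True
        if not changed:                         # fixed point reached: the attractor
            break
    return y
```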

26. Recognition in Video Three neuro-associative principles. Efficient recognition in the brain is achieved by: 1. non-linear processing; 2. massively distributed collective decision making; 3. synaptic plasticity, a) to accumulate learning data in time by adjusting synapses, and b) to associate receptor to effector (using the thus-computed synaptic values).

27. Recognition in Video A model of memory: attractor-based neural networks. Main question: how to compute the synapses Cij so that a) the desired patterns V^m become attractors, i.e. V^m = sign(C V^m), and b) the network exhibits the best associative (error-correction) properties, i.e. the largest attraction basins and the largest number M of stored prototypes? [McCulloch-Pitts'43, Hebb'49, Amari'71,'77, Hopfield'82, Sejnowski'89, Gorodnichy'97]

28. Recognition in Video How to update the weights. The synaptic increment should be a function of: 1. the current stimulus pair V = (R, E); 2. what has already been memorized. The main question: which learning rule to use?

29. Recognition in Video High-Performance Associative Neural Network Library. Learning rules: Hebb (correlation learning); generalized Hebb; Widrow-Hoff's (delta) rule; projection learning, the best, since it takes into account the relevance of the training stimuli and attributes (sketches of these rules follow). Publicly available at:
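The slide's formulas were shown as images; below is a hedged NumPy sketch of the three update rules it names, for bipolar patterns V in {+1,-1}^N. It is an illustration, not the NRC library's actual code.

```python
import numpy as np

def hebb_update(C, V):
    """Hebb (correlation) learning: C <- C + V V^T / N."""
    V = V.reshape(-1, 1)
    return C + (V @ V.T) / len(V)

def widrow_hoff_update(C, V, lr=0.1):
    """Widrow-Hoff (delta) rule: nudge C V towards V."""
    V = V.reshape(-1, 1)
    error = V - C @ V
    return C + lr * (error @ V.T) / len(V)

def projection_update(C, V, eps=1e-10):
    """Incremental projection (pseudo-inverse) learning:
    C becomes the projector onto the span of the stored patterns."""
    V = V.reshape(-1, 1)
    residual = V - C @ V
    norm2 = (residual * residual).sum()
    if norm2 < eps:                 # pattern already in the span: nothing to learn
        return C
    return C + (residual @ residual.T) / norm2
```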

30. Recognition in Video Guaranteed to converge. The behaviour of the network is governed by an energy function, and the network always converges to an attractor as long as Cij = Cji [Gorodnichy & Reznik'97].
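For reference, the standard energy function of such a binary network (the Hopfield form, which is assumed here since the slide's own formula is not reproduced) is

```latex
E(Y) \;=\; -\tfrac{1}{2}\sum_{i,j} C_{ij}\, Y_i\, Y_j , \qquad Y_i \in \{+1,-1\},
```

and each asynchronous update Y_i <- sign(sum_j C_ij Y_j) cannot increase E when C_ij = C_ji, which is why convergence to a fixed point (an attractor) is guaranteed.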

31. Recognition in Video What do the stored data look like? A look inside the brain: by looking at the synaptic weights Cij, one can say what is stored: how many main attractors (stored memories) the network has, and how good the retrieval is.

32. Recognition in Video Building from blocks. 1. E.g. a facial memory modeled as a collection of mutually interconnected associative (attractor-based) networks: one for orientation, one for gender, one for association with voice, etc. 2. After one network (of features) reaches an attractor, it passes the information to the next network (of name tags). 3. Neurons responsible for specific tags will fire, passing information to other neural blocks. Idea for the "International Brain Station": to build a distributed (over many computers and many countries) model of the real brain (with … neurons)! Too far-fetched? Possibly not.

33. Recognition in Video Possible outcomes: – one effector neuron fires: the face in the frame is associated with one person (confident); – several neurons fire: the face is associated with several individuals (hesitating or confused); – none of the neurons fire: the face is not associated with any of the previously seen faces. The final decision is made over several frames (a small sketch follows).
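A hedged sketch of this per-frame outcome logic and a decision accumulated over several frames; the firing threshold and the vote counting are assumptions, not the NRC scheme.

```python
import numpy as np
from collections import Counter

def frame_outcome(nametag_activations, fire_threshold=0.0):
    """Per-frame outcome from the nametag (effector) neuron activations."""
    fired = np.flatnonzero(np.asarray(nametag_activations) > fire_threshold)
    if len(fired) == 1:
        return ("confident", int(fired[0]))     # one person
    if len(fired) > 1:
        return ("hesitating", fired.tolist())   # several candidates
    return ("unknown", None)                    # no one recognized

def decide_over_frames(per_frame_outcomes, min_votes=5):
    """Final decision: the identity voted for most often across confident frames."""
    votes = Counter(pid for kind, pid in per_frame_outcomes if kind == "confident")
    if not votes:
        return None
    person, count = votes.most_common(1)[0]
    return person if count >= min_votes else None
```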

34. Recognition in Video 3. Let's build it now! a) Overview of computer techniques and principles

35. Recognition in Video What do we want? To incorporate all we know into VideoRecLibrary.lib, to be used for many applications. A video recognition system is built as: the same techniques (.lib) + a priori task-specific knowledge (main()).

36. Recognition in Video Colour space. Perform image analysis in a non-linearly transformed colour space, such as HSV or YCrCb, that preserves colour inter-relationships (separates brightness from colour). We use UCS (a perceptually uniform colour space), a non-linear version of the YCrCb space obtained from empirical studies of the psychophysical thresholds of human colour perception.
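A trivial sketch of moving to such brightness-separating spaces with OpenCV; the UCS transform itself is not reproduced here.

```python
import cv2

def to_colour_spaces(frame_bgr):
    """Return the frame in two spaces that separate brightness from chroma."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    return hsv, ycrcb
```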

37. Recognition in Video Local importance. Global image intensity and colour normalization (histogram equalization and normalization) is good, but local (fovea-driven) image analysis is better. This not only enhances video information, but also provides a way to filter out noise. Includes: local intensity normalization, local structure transforms, median filtering, and morphological dilation and closing. For the same reason, it is often preferable to analyze gradient images rather than absolute-value images.
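A possible OpenCV sketch of this local preprocessing chain; CLAHE stands in for local intensity normalization, and the kernel sizes are assumptions.

```python
import cv2

def local_preprocess(gray):
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # local intensity normalization
    norm = clahe.apply(gray)
    denoised = cv2.medianBlur(norm, 3)                            # median filter
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    closed = cv2.morphologyEx(denoised, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion
    grad = cv2.morphologyEx(closed, cv2.MORPH_GRADIENT, kernel)   # gradients rather than absolute values
    return grad
```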

38. Recognition in Video Collective decision. Use collective decisions rather than pixel-based ones. This includes: using support areas for change/motion detection; using higher-order, parametrically represented objects (such as lines and rectangles) for detection and tracking, instead of the raw pixels that comprise the objects.

39. Recognition in Video Accumulation over time. Finally, the temporal advantage of video, which allows one to observe the same scene or object over a period of time, has to be exploited. Includes: – histogram-based learning: accumulates values, but doesn't learn intra-value relationships; – correlation learning: updates intra-value information and also learns local structure information; – other: Bayesian belief / evidence propagation.
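A hedged sketch of histogram-based accumulation of an object's colour over time; the bin count and decay factor are assumptions, and `ColourModel` is an illustrative name.

```python
import cv2
import numpy as np

class ColourModel:
    def __init__(self, bins=32, decay=0.95):
        self.hist = np.zeros((bins, bins), dtype=np.float32)
        self.bins = bins
        self.decay = decay

    def accumulate(self, patch_bgr):
        """Fold one more observation of the object into the running Cr/Cb histogram."""
        ycrcb = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2YCrCb)
        h = cv2.calcHist([ycrcb], [1, 2], None, [self.bins, self.bins],
                         [0, 256, 0, 256])
        self.hist = self.decay * self.hist + (1.0 - self.decay) * h
```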

40. Recognition in Video Object detection schemes: 1. 1/ Extraction from the background: – when the object's features are not known in advance (e.g. when the object is of unknown colour or shape); – when the camera is stationary with respect to the background; – commonly used for surveillance applications. Techniques: non-linear change detection, which considers a pixel changed based on the patch around the pixel rather than the pixel itself; statistical background modeling (such as the Mixture of Gaussians technique), which learns the values of the background pixels over time; second-order change detection. (A sketch follows.)
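A sketch of scheme 1: Mixture-of-Gaussians background modelling plus a simple patch-support change test in the spirit of the non-linear change detection described above; the thresholds and patch size are assumptions.

```python
import cv2
import numpy as np

mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

def foreground_mask(frame_bgr, patch=5, min_changed_fraction=0.4):
    raw = mog.apply(frame_bgr)                        # per-pixel MoG decision
    raw = (raw == 255).astype(np.uint8)               # drop the shadow label (127)
    # declare a pixel changed only if enough of the surrounding patch changed
    support = cv2.boxFilter(raw.astype(np.float32), -1, (patch, patch))
    return (support > min_changed_fraction).astype(np.uint8) * 255
```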

41. Recognition in Video Example: non-linear change detection; second-order change detection [Gorodnichy'03].

42. Recognition in Video Object detection schemes: 2. 2/ By recognizing the object's dominant features: – when the object features (colour, shape, texture) are known; – when the camera-scenery setup makes background/foreground computation impossible. Used for face and skin detection: skin colour detection with edge-merging techniques [Terrillon, Nolker]; hysteresis-like colour segmentation of skin colour [Argyros]. (A sketch follows.)
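A sketch of scheme 2: skin detection by thresholding chroma in YCrCb; the Cr/Cb bounds below are common rule-of-thumb values, not those of the cited methods.

```python
import cv2
import numpy as np

def skin_mask(frame_bgr):
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)    # Y, Cr, Cb lower bounds
    upper = np.array([255, 173, 127], dtype=np.uint8) # Y, Cr, Cb upper bounds
    mask = cv2.inRange(ycrcb, lower, upper)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove isolated pixels
```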

43. Recognition in Video Example

44. Recognition in Video Object detection schemes: 3. 3/ Detection by tracking: – the past information about the object's location and size is used to narrow the search area and to detect the object; – speeds up the detection of objects; – makes it possible to detect occluded/disappearing objects. Challenging problem: multi-object tracking; it can be resolved by on-the-fly memorization of objects using associative principles (NRC). (A sketch of the search-window idea follows.)
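A sketch of scheme 3: use the previous location and size to narrow the window in which the detector is run; the margin factor is an assumption.

```python
def search_window(prev_box, frame_shape, margin=0.5):
    """Return (x, y, w, h) of the region to search, grown around the previous box."""
    x, y, w, h = prev_box
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1 = min(frame_shape[1], x + w + dx)
    y1 = min(frame_shape[0], y + h + dy)
    return x0, y0, x1 - x0, y1 - y0     # the detector is run only inside this window
```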

45. Recognition in Video Pianist Hand & Fingers Annotation

46. Recognition in Video 3. Let's build it now! b) Face-oriented applications

47. Recognition in Video Applicability of 160x120 video. According to face anthropometrics: studied on the BioID database, tested with Perceptual Vision interfaces, observed in cinematography. Face sizes considered: ½ of the image, ¼, 1/8 and 1/16 (in pixels: 80x80, 40x40, 20x20 and 10x10), with the corresponding between-eyes distance (IOD), eye size and nose size. Per-task applicability across these sizes (FS, FD, FT, FL, FER, FC, FM/FI), with the legend: good; b – barely applicable; - – not good.

48. Recognition in Video What can be detected. The range of facial orientation and expression tolerated by current face detection technology: – faces can be found as long as the i.o.d. > 10 pixels, – in bad illumination, – with different orientations, – with varying facial expressions. If a face is too far away or badly oriented, motion and colour can be used to detect and track it.

49. Recognition in Video Anthropometrics of the face. Surprised by the binary proportions of our faces? Key facial dimensions can be expressed as simple multiples of the IOD.

50. Recognition in Video 3. Let's build it now! c) From pixels to a person's name

51. Recognition in Video From video frame to neural values: 1. Face-looking regions are detected using rapid classifiers. 2. They are verified to have skin colour and not to be static. 3. (Face rotation is detected and compensated, the eyes are aligned, and)* the face is resampled to 12-pixels-between-the-eyes resolution and extracted. 4. The extracted face is converted to a binary feature vector* (R = …). 5. This vector is then appended with a nametag vector (E = …); in recognition, E = (…). 6. The synapses Cij are updated with projection learning* using V = (R, E). (An end-to-end sketch follows.)
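A hedged end-to-end sketch of steps 3-6 above. The 24x24 crop, the mean-threshold binarization and the single synchronous readout pass are simplifying assumptions, not the author's exact pipeline; full attractor dynamics could be iterated as in the earlier recall sketch.

```python
import cv2
import numpy as np

IOD_PIXELS = 12
FACE_SIZE = (2 * IOD_PIXELS, 2 * IOD_PIXELS)      # assumed square crop

def face_to_binary_vector(gray_face):
    """Steps 3-4: resample to ~12 pixels between the eyes, binarize into a bipolar vector R."""
    small = cv2.resize(gray_face, FACE_SIZE, interpolation=cv2.INTER_AREA)
    return np.where(small.flatten() > small.mean(), 1.0, -1.0)

def memorize(C, gray_face, person_id, num_people):
    """Steps 5-6: append the nametag vector E and update the synapses (projection learning)."""
    R = face_to_binary_vector(gray_face)
    E = -np.ones(num_people)
    E[person_id] = 1.0
    V = np.concatenate([R, E]).reshape(-1, 1)
    residual = V - C @ V
    norm2 = (residual * residual).sum()
    return C if norm2 < 1e-10 else C + (residual @ residual.T) / norm2

def recognize(C, gray_face, num_people):
    """Feed R with an 'empty' nametag part and read which nametag neurons fire (+1)."""
    R = face_to_binary_vector(gray_face)
    V0 = np.concatenate([R, -np.ones(num_people)])
    activations = np.sign(C @ V0)
    return activations[-num_people:]
```

Here C would start as a zero matrix of size (24·24 + num_people) squared, and the returned nametag states feed the per-frame outcome logic sketched earlier.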

52. Recognition in Video Part II. Examples of use: Perceptual Vision Technology

53. Recognition in Video Computer Vision allows computers to see. Perceptual Vision allows computers to understand what they see ™

54. Recognition in Video Important application #1: Perceptual Vision User Interfaces. – Over 1000 requests for this technology since 2003. – Local partner: SCO Ottawa Health Center.

55. Recognition in Video Nouse ™ : what you have already seen

56. Recognition in Video Comprehensive hands-free approach. Users with accessibility needs (e.g. residents of the SCO Ottawa Health Center) will benefit the most, but other users benefit too. Seeing tasks: 1. Where – to see where the user is: {x, y, z…}; 2. What – to see what the user is doing: {actions}; 3. Who – to see who the user is: {names}. Our goal: to build systems which can do all three tasks (driving a PUI pointer {x, y, z}, binary ON/OFF events, and recognition/memorization, e.g. "Unknown User!").

57. Recognition in Video Nouse: Now with user recognition

58. Recognition in Video How do we test? Using video sequences! Having seen a face in one video sequence, recognize it in another video sequence. Suggested benchmarks: – the IIT-NRC video-based face database: synapse.vit.iit.nrc.ca/db/video/faces (pairs of facial videos: one for learning, one for testing); – TV (VCD) recordings: recognize actors / politicians, from "on the same stage" to "in different stages and years".

59. Recognition in Video Try it yourself. – Works with your web-cam or an .avi file. – Shows the "brain model" synapses as you watch (in memorization mode). – Shows the nametag neurons' states as you watch a facial video (in recognition mode).

60. Recognition in Video IIT-NRC facial database. Pairs of 160x120 low-quality, lots-of-variation MPEG-compressed videos. Faces occupy 1/4 – 1/8 of the frame (i.o.d. = … pixels). 11 individuals registered: IDs 1-10 are memorized, ID 0 is not.

61. Recognition in Video Annotation of faces on TV. Problem: recognize faces in 160x120 MPEG-1 video ("Jack, Paul, Steven, Gilles"). 4 guests were memorized from 15-second clips showing the face; 20 minutes of video were used for recognition; 95% recognition.

62. Recognition in Video Important application #2: for security. – Suits the Strategic Initiative of IIT-NRC. – Local partners: CBSA (Canada Border Services Agency), CPRC (Canadian Police Research Center).

63. Recognition in Video ACE Surveillance. ACE stands for Automated Critical Evidence. Problems addressed: – the recording-space problem; – the data-management problem. Solution: do as much video recognition on-line as possible. A Critical Evidence Snapshot (CES) is defined as a video snapshot that provides the viewer with a piece of information that is both useful and new.

64. Recognition in Video Challenges: –Poor visibility –Poor images (esp. Wireless cameras) –Manageability of data

65. Recognition in Video System architecture. For each video frame, in real time (online), perform: – detection of object(s) in the video using change- and foreground-detection techniques; – computation of the attributes of the detected object(s): location and size (x, z, w, h) and their derivatives (changes) in time d(x, z, w, h), colour (histogrammed) Pcolour(i,j), texture (histogrammed) Ptexture(i,j); – recognition of the object(s) as either new or already seen, based on their attributes; – classification of the frame as either a CES (i.e. providing new information) or not; – extraction and creation of CES annotations: timestamps, outlines, counter. (A per-frame sketch follows.) Future work: replace histogram-based learning of objects over time with correlation-based associative memorization, using the PINN Library (a synaptically tuned associative neural network); when a face is close, use the associative memory to memorize it.
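A hedged sketch of the per-frame loop described above: detect foreground objects and compute the attribute set (bounding box, its change in time, colour and texture histograms). The thresholds, bin counts, the Laplacian-based texture code and the naive index-based matching to previous boxes are all assumptions.

```python
import cv2
import numpy as np

mog = cv2.createBackgroundSubtractorMOG2()

def frame_attributes(frame_bgr, prev_boxes):
    """prev_boxes: dict of object index -> (x, y, w, h) from the previous frame."""
    mask = cv2.medianBlur(mog.apply(frame_bgr), 5)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    objects = []
    for k, c in enumerate(contours):
        if cv2.contourArea(c) < 200:                      # ignore small blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        patch = frame_bgr[y:y+h, x:x+w]
        colour_hist = cv2.calcHist([cv2.cvtColor(patch, cv2.COLOR_BGR2YCrCb)],
                                   [1, 2], None, [16, 16], [0, 256, 0, 256])
        gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
        texture_hist = cv2.calcHist([cv2.Laplacian(gray, cv2.CV_8U)],
                                    [0], None, [16], [0, 256])
        prev = prev_boxes.get(k)
        delta = tuple(np.subtract((x, y, w, h), prev)) if prev else (0, 0, 0, 0)
        objects.append({"box": (x, y, w, h), "d_box": delta,
                        "colour": colour_hist, "texture": texture_hist})
    return objects
```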

66. Recognition in Video Example of a run. The Automated Critical Evidence Snapshots Extractor (ACESE) compresses a 9-hour video into 1 minute of annotated summary.

67. Recognition in Video Conclusion. Future (promising) trends: in research, recognition-assisted tracking and tracking-assisted recognition, also in multi-camera setups; in industry, interaction (hands-free), enhancement (e-presence) and security. Live demo: face tracking and memorization of the audience. Thank you! Questions and discussion welcome!