Template matching and object recognition. CS8690 Computer Vision University of Missouri at Columbia

Presentation transcript:

Template matching and object recognition

Matching by relations
Idea:
– find bits, then say object is present if bits are ok
Advantage:
– objects with complex configuration spaces don’t make good templates: internal degrees of freedom, aspect changes, (possibly) shading, variations in texture, etc.

Simplest
Define a set of local feature templates
– could find these with filters, etc.
– corner detector + filters
Think of objects as patterns
Each template votes for all patterns that contain it
Pattern with the most votes wins
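The voting scheme can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: the pattern names, the template labels, and the function `vote_for_patterns` are all hypothetical, and template detection itself is assumed to have already happened.

```python
from collections import Counter

def vote_for_patterns(detected_templates, pattern_templates):
    """Each detected template votes once for every pattern that contains it."""
    votes = Counter()
    for template in detected_templates:
        for pattern, templates in pattern_templates.items():
            if template in templates:
                votes[pattern] += 1
    if not votes:
        return None, votes  # nothing detected belongs to any known pattern
    best = max(votes, key=votes.get)
    return best, votes

# Hypothetical patterns, each described by the set of local templates it contains.
pattern_templates = {
    "face": {"eye", "nose", "mouth"},
    "car": {"wheel", "window", "corner"},
}
# Hypothetical detector output for one image (repeats are allowed).
detected = ["eye", "corner", "nose", "eye"]
best, votes = vote_for_patterns(detected, pattern_templates)
print(best, dict(votes))
```

Every vote counts the same here, whatever the template and wherever it was found, which is what makes this the simplest variant.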

Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE

Probabilistic interpretation
Write
Assume
Likelihood of image given pattern

Possible alternative strategies
Notice:
– different patterns may yield different templates with different probabilities
– different templates may be found in noise with different probabilities
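One way to act on these two observations is to weight each vote by how much more likely the template is under the pattern than under noise. The sketch below assumes detected templates are independent given the pattern, which is an assumption made for illustration; the probability tables and the function name are hypothetical.

```python
import math

def log_likelihood_ratio(detected_templates, p_given_pattern, p_given_noise):
    """Sum of log[P(template | pattern) / P(template | noise)] over detections.

    Assumes detections are independent given the pattern (illustrative only).
    """
    score = 0.0
    for t in detected_templates:
        score += math.log(p_given_pattern[t]) - math.log(p_given_noise[t])
    return score

# Hypothetical probabilities: "eye" is typical of the pattern and rare in noise,
# while "corner" turns up in noise more often than in the pattern.
p_given_pattern = {"eye": 0.6, "corner": 0.1}
p_given_noise = {"eye": 0.05, "corner": 0.4}

score = log_likelihood_ratio(["eye", "eye", "corner"], p_given_pattern, p_given_noise)
print(score > 0)  # a positive score favours "pattern present"
```

A template that is common in noise now contributes a negative vote rather than a full positive one.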

Employ spatial relations
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE

Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE

Example
Training examples
Test image


Finding faces using relations
Strategy:
– Face is eyes, nose, mouth, etc. with appropriate relations between them
– build a specialised detector for each of these (template matching) and look for groups with the right internal structure
– Once we’ve found enough of a face, there is little uncertainty about where the other bits could be

Finding faces using relations
Strategy: compare
Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE

Detection
This means we compare

Issues
Plugging in values for position of nose, eyes, etc.
– search for next one given what we’ve found
When to stop searching
– when nothing that is added to the group could change the decision
– i.e. it’s not a face, whatever features are added, or
– it’s a face, and anything you can’t find is occluded
What to do next
– look for another eye? or a nose?
– probably look for the easiest to find
What if there’s no nose response
– marginalize
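The stopping rule can be made concrete if each feature contributes an additive score with known bounds. That additive form, the bounds, and the names below are assumptions for illustration; the slides do not specify how the decision quantity is computed.

```python
def decide(current_score, remaining_bounds, threshold):
    """Stop as soon as no remaining feature could change the decision.

    remaining_bounds: (lowest, highest) score contribution for each feature
    not yet searched for; the lowest value stands for a missing or occluded
    feature. All quantities here are hypothetical.
    """
    best_case = current_score + sum(high for _, high in remaining_bounds)
    worst_case = current_score + sum(low for low, _ in remaining_bounds)
    if best_case < threshold:
        return "reject"          # not a face, whatever features are added
    if worst_case >= threshold:
        return "accept"          # a face, even if the rest is occluded
    return "keep searching"

print(decide(1.0, [(-0.5, 0.5)], threshold=2.0))
print(decide(3.0, [(-0.5, 0.5)], threshold=2.0))
print(decide(1.8, [(-0.5, 0.5)], threshold=2.0))
```

Searching for the feature with the widest bounds first tends to settle the decision soonest, though the slide's own suggestion is to look for whichever feature is easiest to find.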

Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE

Pruning
Prune using a classifier
– crude criterion: if this small assembly doesn’t work, there is no need to build on it.
Example: finding people without clothes on
– find skin
– find extended skin regions
– construct groups that pass local classifiers (e.g. lower arm, upper arm)
– give these to broader scale classifiers (e.g. girdle)
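The crude criterion amounts to growing assemblies one part at a time and discarding any partial assembly that a local classifier rejects. In the sketch below the "classifier" is a hypothetical length-ratio test on consecutive segments, chosen only to keep the example short.

```python
def grow_assemblies(candidates_per_part, is_acceptable):
    """Extend assemblies part by part, pruning any partial assembly that fails."""
    assemblies = [()]
    for candidates in candidates_per_part:
        extended = []
        for partial in assemblies:
            for candidate in candidates:
                new = partial + (candidate,)
                if is_acceptable(new):      # crude criterion: never build on a failure
                    extended.append(new)
        assemblies = extended
    return assemblies

def similar_lengths(assembly):
    """Hypothetical local classifier: consecutive segment lengths within a factor of 2."""
    return all(0.5 <= b / a <= 2.0 for a, b in zip(assembly, assembly[1:]))

# Hypothetical candidate segment lengths for three consecutive limb parts.
candidates_per_part = [[10, 40], [12, 100], [11]]
print(grow_assemblies(candidates_per_part, similar_lengths))
```

Without pruning, the search would enumerate every combination of candidates; with it, most partial groups are dropped before they are extended.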

Pruning
Prune using a classifier
– better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop
– equivalent to projecting classifier boundaries.

Horses

Hidden Markov Models
Elements of sign language understanding
– the speaker makes a sequence of signs
– some signs are more common than others
– the next sign depends (roughly, and probabilistically) only on the current sign
– there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values
Many problems share these properties
– tracking is like this, for example

Hidden Markov Models
Now in each state we could emit a measurement, with probability depending on the state and the measurement
We observe these measurements

HMM’s - dynamics

HMM’s - the Joint and Inference

Trellises
Each column corresponds to a measurement in the sequence
Trellis makes the collection of legal paths obvious
Now we would like to get the path with the largest log-posterior (equivalently, the smallest negative log-posterior)
Trellis makes this easy, as follows.
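Finding the best path through the trellis is dynamic programming (the Viterbi algorithm): each column keeps, for every state, the cheapest path ending there, where cost is negative log-probability. The sketch below assumes discrete measurements; the two-state model and its numbers are hypothetical.

```python
import math

def viterbi(observations, states, log_prior, log_trans, log_emit):
    """Return the most probable state path and its negative log-probability."""
    # cost[s]: smallest negative log-probability of any path ending in state s
    cost = {s: -(log_prior[s] + log_emit[s][observations[0]]) for s in states}
    back = []  # one dict of back-pointers per trellis column after the first
    for obs in observations[1:]:
        new_cost, pointers = {}, {}
        for s in states:
            prev = min(states, key=lambda r: cost[r] - log_trans[r][s])
            new_cost[s] = cost[prev] - log_trans[prev][s] - log_emit[s][obs]
            pointers[s] = prev
        cost = new_cost
        back.append(pointers)
    last = min(states, key=cost.get)
    path = [last]
    for pointers in reversed(back):   # follow back-pointers to the first column
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}

# Hypothetical two-state model with measurements "x" and "y".
states = ["A", "B"]
log_prior = logs({"A": 0.6, "B": 0.4})
log_trans = logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = logs({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}})

path, neg_log_prob = viterbi(["x", "x", "y"], states, log_prior, log_trans, log_emit)
print(path, math.exp(-neg_log_prob))
```

The work is proportional to the number of columns times the square of the number of states, rather than to the number of paths.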


Fitting an HMM
I have:
– sequence of measurements
– collection of states
– topology
I want:
– state transition probabilities
– measurement emission probabilities
Straightforward application of EM
– discrete vars give state for each measurement
– M step is just averaging, etc.
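For discrete measurements, EM on an HMM is usually called the Baum-Welch algorithm. The sketch below runs it on a single sequence with scaled forward and backward passes; the starting parameters and the data are hypothetical, and it assumes every state keeps nonzero posterior weight (otherwise the averages divide by zero). Transitions that start at zero stay at zero, which is one way to encode the topology.

```python
import math

def baum_welch_step(obs, prior, trans, emit):
    """One EM iteration; returns updated (prior, trans, emit) and the
    log-likelihood of obs under the parameters passed in."""
    n, T, n_symbols = len(prior), len(obs), len(emit[0])
    # E step, forward pass: alpha is rescaled to sum to 1 at every time step.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [prior[i] * emit[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        alpha.append([a / c for a in row])
        scale.append(c)
    # E step, backward pass, using the same scale factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # Posterior over the hidden state at each time, and summed pairwise posteriors.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[sum(alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                   / scale[t + 1] for t in range(T - 1))
               for j in range(n)] for i in range(n)]
    # M step: the new parameters are normalised averages of the posteriors.
    new_prior = gamma[0][:]
    new_trans = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)] for i in range(n)]
    new_emit = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
                 / sum(gamma[t][i] for t in range(T))
                 for k in range(n_symbols)] for i in range(n)]
    return new_prior, new_trans, new_emit, sum(math.log(c) for c in scale)

# Hypothetical starting guess and measurement sequence (symbols 0 and 1).
prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
log_likelihoods = []
for _ in range(5):
    prior, trans, emit, ll = baum_welch_step(obs, prior, trans, emit)
    log_likelihoods.append(ll)
print(log_likelihoods)
```

The log-likelihood should not decrease from one iteration to the next; in practice several sequences and several random restarts would be used.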

HMM’s for sign language understanding-1
Build an HMM for each word

HMM’s for sign language understanding-2
Build an HMM for each word
Then build a language model
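With one HMM per word, isolated-word recognition can be sketched as scoring the measurement sequence under each word's model with the forward algorithm and picking the best. The two word models, the symbol alphabet, and the word prior (a very crude stand-in for a language model) are all hypothetical here.

```python
import math

def forward_log_likelihood(obs, prior, trans, emit):
    """log P(obs | model) for a discrete HMM, using a rescaled forward pass.

    Raises ValueError (from math.log) if the model gives obs zero probability.
    """
    n = len(prior)
    alpha = [prior[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]   # rescale to avoid underflow
    return log_lik

def recognize(obs, word_models, word_prior):
    """Pick the word whose HMM, weighted by its prior, best explains obs."""
    scores = {word: forward_log_likelihood(obs, *model) + math.log(word_prior[word])
              for word, model in word_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical left-to-right models: (prior, transitions, emissions over symbols 0/1).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
word_models = {
    "hello": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "bye": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
word_prior = {"hello": 0.5, "bye": 0.5}

best_word, scores = recognize([0, 0, 1, 1], word_models, word_prior)
print(best_word)
```

A real language model would constrain sequences of words rather than weight single words, for example by linking the word HMMs into one larger trellis.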

Figure from “Real time American sign language recognition using desk and wearable computer based video,” T. Starner, et al., Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE
User gesturing
For both isolated word recognition tasks and for recognition using a language model that has five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland’s system displays a word accuracy of the order of 90%. Values are slightly larger or smaller, depending on the features, the task, etc.

HMM’s can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents correspondence between an image token and a body part, and do DP on this trellis.
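For the torso, arm and leg model, the dynamic programme is short: once a torso token is fixed, the best arm and the best leg can be chosen independently. The sketch below is illustrative only; the token positions, appearance costs, ideal offsets and the quadratic deformation cost are all hypothetical choices.

```python
def best_configuration(torso_cands, arm_cands, leg_cands, unary, pair):
    """Return (total cost, torso, arm, leg) minimising appearance plus deformation cost."""
    best = None
    for torso in torso_cands:
        # Given the torso, each limb is optimised on its own.
        arm = min(arm_cands, key=lambda a: unary[("arm", a)] + pair("arm", torso, a))
        leg = min(leg_cands, key=lambda l: unary[("leg", l)] + pair("leg", torso, l))
        total = (unary[("torso", torso)]
                 + unary[("arm", arm)] + pair("arm", torso, arm)
                 + unary[("leg", leg)] + pair("leg", torso, leg))
        if best is None or total < best[0]:
            best = (total, torso, arm, leg)
    return best

# Hypothetical ideal offsets of each limb relative to the torso.
ideal_offset = {"arm": (2, 0), "leg": (0, 3)}

def pair(part, torso, child):
    """Squared distance between the child's position and its ideal position."""
    dx = child[0] - torso[0] - ideal_offset[part][0]
    dy = child[1] - torso[1] - ideal_offset[part][1]
    return dx * dx + dy * dy

# Hypothetical image tokens (x, y) with appearance costs (lower is better).
torso_cands = [(0, 0), (10, 10)]
arm_cands = [(2, 0), (12, 11)]
leg_cands = [(0, 3), (9, 14)]
unary = {
    ("torso", (0, 0)): 1.0, ("torso", (10, 10)): 0.5,
    ("arm", (2, 0)): 0.2, ("arm", (12, 11)): 0.1,
    ("leg", (0, 3)): 0.3, ("leg", (9, 14)): 0.3,
}

total, torso, arm, leg = best_configuration(torso_cands, arm_cands, leg_cands, unary, pair)
print(total, torso, arm, leg)
```

The cost grows with the number of torso candidates times the number of limb candidates, instead of with the product of all three candidate sets.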


Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE