1
Template matching and object recognition
2
CS8690 Computer Vision, University of Missouri at Columbia
Matching by relations
Idea:
–find bits, then say the object is present if the bits are OK
Advantage:
–objects with complex configuration spaces don't make good templates
internal degrees of freedom
aspect changes
(possibly) shading variations in texture
etc.
3
Simplest
Define a set of local feature templates
–could find these with filters, etc.
–corner detector + filters
Think of objects as patterns
Each template votes for all patterns that contain it
The pattern with the most votes wins
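The voting scheme above can be sketched in a few lines; the feature labels and pattern sets below are hypothetical, invented for illustration:

```python
from collections import Counter

def vote_for_patterns(detected_features, patterns):
    """Each detected local feature template votes for every pattern
    that contains it; the pattern with the most votes wins."""
    votes = Counter()
    for feat in detected_features:
        for name, feats in patterns.items():
            if feat in feats:
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical feature vocabularies for two object patterns.
patterns = {
    "mug": {"handle-corner", "rim-arc", "body-edge"},
    "book": {"spine-edge", "corner", "body-edge"},
}
print(vote_for_patterns({"rim-arc", "handle-corner", "corner"}, patterns))  # prints "mug" (2 votes vs 1)
```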
4
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
5
Probabilistic interpretation
Write the posterior over patterns given the image; assume the template detections are conditionally independent; this gives the likelihood of the image given the pattern.
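The equations these bullets labelled did not survive transcription; a plausible reconstruction, with assumed notation (detected templates $T_1,\dots,T_n$) and a conditional-independence assumption across templates:

```latex
% Posterior over patterns, via Bayes:
P(\text{pattern} \mid \text{image}) \;\propto\; P(\text{image} \mid \text{pattern})\, P(\text{pattern})
% Likelihood of the image given the pattern, assuming the template
% detections T_1, ..., T_n are conditionally independent:
P(\text{image} \mid \text{pattern}) \;\approx\; \prod_{i=1}^{n} P(T_i \mid \text{pattern})
```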
6
Possible alternative strategies
Notice:
–different patterns may yield different templates with different probabilities
–different templates may be found in noise with different probabilities
7
Employ spatial relations
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
8
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
9
Example
Training examples
Test image
12
Finding faces using relations
Strategy:
–a face is eyes, nose, mouth, etc. with appropriate relations between them
–build a specialised detector for each of these (template matching) and look for groups with the right internal structure
–once we've found enough of a face, there is little uncertainty about where the other bits could be
13
Finding faces using relations
Strategy: compare. Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
14
Detection
This means we compare the likelihoods of the competing hypotheses (face present vs. not) given the features found.
15
Issues
Plugging in values for the position of nose, eyes, etc.
–search for the next one given what we've found
When to stop searching
–when nothing that could be added to the group would change the decision, i.e.
–it's not a face, whatever features are added, or
–it's a face, and anything you can't find is occluded
What to do next
–look for another eye? or a nose?
–probably look for whichever is easiest to find
What if there's no nose response?
–marginalize
16
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
17
Pruning
Prune using a classifier
–crude criterion: if this small assembly doesn't work, there is no need to build on it
Example: finding people without clothes on
–find skin
–find extended skin regions
–construct groups that pass local classifiers (i.e. lower arm, upper arm)
–give these to broader-scale classifiers (e.g. girdle)
18
Pruning
Prune using a classifier
–better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop
–equivalent to projecting classifier boundaries
19
Horses
20
Hidden Markov Models
Elements of sign language understanding
–the speaker makes a sequence of signs
–some signs are more common than others
–the next sign depends (roughly, and probabilistically) only on the current sign
–there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values
Many problems share these properties
–tracking is like this, for example
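The properties listed above can be captured in a tiny generative sketch; the signs, measurements, and probability values below are invented for illustration:

```python
import random

# A toy generative HMM: the next sign depends only on the current sign,
# and each sign emits a noisy measurement. All names and numbers here
# are made up for illustration.
states = ["hello", "thanks"]
start = {"hello": 0.7, "thanks": 0.3}             # some signs are more common
trans = {"hello": {"hello": 0.4, "thanks": 0.6},  # next sign depends only on current
         "thanks": {"hello": 0.5, "thanks": 0.5}}
emit = {"hello": {"wave": 0.8, "flat": 0.2},      # measurement density per sign
        "thanks": {"wave": 0.1, "flat": 0.9}}

def sample(dist):
    """Draw one key from a {value: probability} dict."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k  # guard against floating-point round-off

def generate(n):
    """Emit n (hidden sign, observed measurement) pairs."""
    s = sample(start)
    seq = []
    for _ in range(n):
        seq.append((s, sample(emit[s])))
        s = sample(trans[s])
    return seq
```

We observe only the measurement half of each pair; the signs themselves stay hidden, which is what makes the model "hidden" Markov.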
21
Hidden Markov Models
Now in each state we could emit a measurement, with probability depending on the state and the measurement. We observe these measurements.
22
HMM's - dynamics
23
HMM's - the Joint and Inference
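The slide's equations are missing; the standard HMM joint factorization they presumably showed, with hidden states $x_t$ and measurements $y_t$ (notation assumed):

```latex
P(x_{1:T}, y_{1:T}) \;=\; P(x_1)\,\prod_{t=2}^{T} P(x_t \mid x_{t-1})\,\prod_{t=1}^{T} P(y_t \mid x_t)
% Inference: the MAP state sequence
\hat{x}_{1:T} \;=\; \arg\max_{x_{1:T}} \; P(x_{1:T} \mid y_{1:T})
```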
24
Trellises
Each column corresponds to a measurement in the sequence
The trellis makes the collection of legal paths obvious
Now we would like the path with the smallest negative log-posterior (i.e. the largest posterior)
The trellis makes this easy, as follows.
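The trellis DP is the Viterbi algorithm; a minimal sketch using negative log-probabilities as node costs, so the best path has minimum total cost (the two-state model and its numbers are invented):

```python
import math

def viterbi(obs, states, start, trans, emit):
    """DP on the trellis: each column is one measurement; the cost of a
    node is the smallest negative log-probability of any path reaching it."""
    cost = {s: -math.log(start[s] * emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        new, ptr = {}, {}
        for s in states:
            best = min(states, key=lambda p: cost[p] - math.log(trans[p][s]))
            new[s] = cost[best] - math.log(trans[best][s] * emit[s][o])
            ptr[s] = best
        back.append(ptr)
        cost = new
    # Backtrack from the cheapest final node.
    s = min(states, key=cost.get)
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]

# Toy two-state example (numbers invented):
states = ("A", "B")
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
print(viterbi(["x", "x", "y"], states, start, trans, emit))  # ['A', 'A', 'B']
```

The cost of a trellis is O(T * S^2) for T measurements and S states, versus exponentially many legal paths enumerated naively.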
26
Fitting an HMM
I have:
–a sequence of measurements
–a collection of states
–the topology
I want:
–state transition probabilities
–measurement emission probabilities
Straightforward application of EM
–discrete vars give the state for each measurement
–the M step is just averaging, etc.
27
HMM's for sign language understanding - 1
Build an HMM for each word
28
HMM's for sign language understanding - 2
Build an HMM for each word
Then build a language model
29
Figure from “Real time American sign language recognition using desk and wearable computer based video,” by T. Starner et al., Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE
User gesturing
For both isolated word recognition tasks and for recognition using a language model with five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland's system displays a word accuracy on the order of 90%. Values are slightly larger or smaller, depending on the features, the task, etc.
30
HMM's can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents a correspondence between an image token and a body part, and do DP on this trellis.
32
Figure from “Efficient Matching of Pictorial Structures,” by P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE